
    Evaluating the financial performance of the socially responsible investment under the bull and the bear market conditions

    Get PDF
    This research investigates the effect of market conditions on the performance of socially responsible investment (SRI). Using the Lunde-Timmermann scheme, the sample period, February 2002 to June 2011, is classified into bull-market and bear-market phases. The Sharpe ratio, Jensen's alpha, and the Treynor ratio are employed to compare the performance of SRI unit trusts and non-SRI unit trusts in the United Kingdom. Statistically significant differences are found in the bull market, supporting the view that SRI unit trusts outperform their non-SRI counterparts, whereas in the bear market the SRI unit trusts tend to underperform them. It can be inferred that the differing coverage of bull- and bear-market periods is an important factor behind the diversity of previous research results. A review of earlier studies covering the UK market provides some support for this inference.
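    As a rough illustration of the three performance measures named above (a minimal sketch, not the study's code; the function and variable names are assumptions), the ratios can be computed from periodic fund, market, and risk-free return series as follows:

```python
import numpy as np

def performance_measures(fund_ret, market_ret, rf):
    """Sharpe ratio, Jensen's alpha, and Treynor ratio from simple return series.

    fund_ret, market_ret, rf: 1-D arrays of periodic returns of equal length.
    Illustrative only; the study's exact estimation details are not reproduced.
    """
    fund_ex = np.asarray(fund_ret) - np.asarray(rf)       # fund excess returns
    mkt_ex = np.asarray(market_ret) - np.asarray(rf)      # market excess returns
    beta = np.cov(fund_ex, mkt_ex)[0, 1] / np.var(mkt_ex, ddof=1)
    sharpe = fund_ex.mean() / fund_ex.std(ddof=1)         # reward per unit of total risk
    jensen_alpha = fund_ex.mean() - beta * mkt_ex.mean()  # CAPM intercept
    treynor = fund_ex.mean() / beta                       # reward per unit of systematic risk
    return sharpe, jensen_alpha, treynor
```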

    Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

    Full text link
    It is important for an agent to learn a widely applicable and general-purpose policy that can achieve diverse goals, including images and text descriptions. Considering such perceptually specific goals, the frontier of deep reinforcement learning research is to learn a goal-conditioned policy without hand-crafted rewards. To learn this kind of policy, recent works usually take as the reward the non-parametric distance to a given goal in an explicit embedding space. From a different viewpoint, we propose a novel unsupervised learning approach named goal-conditioned policy with intrinsic motivation (GPIM), which jointly learns both an abstract-level policy and a goal-conditioned policy. The abstract-level policy is conditioned on a latent variable to optimize a discriminator and discovers diverse states that are further rendered into perceptually specific goals for the goal-conditioned policy. The learned discriminator serves as an intrinsic reward function for the goal-conditioned policy to imitate the trajectory induced by the abstract-level policy. Experiments on various robotic tasks demonstrate the effectiveness and efficiency of our proposed GPIM method, which substantially outperforms prior techniques.
    Comment: Accepted by AAAI-2
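    A minimal, hypothetical sketch of the reward structure described in the abstract (assumed details, not the authors' implementation): a discriminator q(z | s) learns to identify the latent variable z from visited states, rewards the abstract-level policy for discovering states that reveal z, and then scores the goal-conditioned policy for reproducing the corresponding behaviour:

```python
import torch
import torch.nn as nn

class SkillDiscriminator(nn.Module):
    """q(z | s): predicts which latent variable z produced state s (assumed MLP)."""
    def __init__(self, state_dim, n_latents):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_latents))

    def log_prob(self, state, z):
        log_q = torch.log_softmax(self.net(state), dim=-1)
        return log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)

def abstract_level_reward(disc, state, z, n_latents):
    # Intrinsic reward for the abstract-level policy: visit states that make
    # the latent z identifiable (a standard skill-discovery objective).
    return disc.log_prob(state, z) + torch.log(torch.tensor(float(n_latents)))

def goal_conditioned_reward(disc, state, z):
    # The same (frozen) discriminator rewards the goal-conditioned policy for
    # imitating the trajectory the abstract-level policy induced for z.
    return disc.log_prob(state, z)
```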

    Multidomain transformer-based deep learning for early detection of network intrusion

    Full text link
    Timely response of Network Intrusion Detection Systems (NIDS) is constrained by the flow-generation process, which requires the accumulation of network packets. This paper introduces Multivariate Time Series (MTS) early detection into NIDS to identify malicious flows before they reach target systems. To this end, we first propose a novel feature extractor, the Time Series Network Flow Meter (TS-NFM), which represents a network flow as an MTS with explainable features, and a new benchmark dataset, SCVIC-TS-2022, is created using TS-NFM and the metadata of CICIDS2017. Additionally, a new deep-learning-based early detection model called the Multi-Domain Transformer (MDT) is proposed, which incorporates the frequency domain into the Transformer. This work further proposes a Multi-Domain Multi-Head Attention (MD-MHA) mechanism to improve the feature-extraction ability of MDT. Based on the experimental results, the proposed methodology improves the packet-based earliness of conventional NIDS (i.e., the percentage of packets used for classification) by a factor of 5x10^4 and the duration-based earliness (i.e., the percentage of the duration of the classified packets of a flow) by a factor of 60, resulting in an 84.1% macro F1 score (31% higher than the Transformer) on SCVIC-TS-2022. Additionally, the proposed MDT outperforms state-of-the-art early detection methods by 5% and 6% on the ECG and Wafer datasets, respectively.
    Comment: 6 pages, 7 figures, 3 tables, IEEE Global Communications Conference (Globecom) 202
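    A toy sketch of the multi-domain idea described above (the fusion scheme and sizes are assumptions, not the paper's MD-MHA definition): attention is applied to both the time-domain packet-feature sequence and its FFT magnitude spectrum, and the two views are fused:

```python
import torch
import torch.nn as nn

class MultiDomainAttention(nn.Module):
    """Sketch: self-attention over time-domain and frequency-domain views of a
    packet-feature sequence, fused by a linear layer. Hyperparameters and the
    fusion scheme are assumptions, not the paper's MD-MHA definition."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                        # x: (batch, seq_len, d_model)
        t_out, _ = self.time_attn(x, x, x)       # attention over raw time steps
        x_freq = torch.fft.rfft(x, dim=1).abs()  # magnitude spectrum along time
        f_out, _ = self.freq_attn(x_freq, x_freq, x_freq)
        # Align lengths: rfft yields ~seq_len/2+1 bins; resample back to seq_len.
        f_out = nn.functional.interpolate(
            f_out.transpose(1, 2), size=x.size(1)).transpose(1, 2)
        return self.fuse(torch.cat([t_out, f_out], dim=-1))
```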

    CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning

    Full text link
    Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and labeled datasets, which eliminates the time-consuming data collection of online RL. However, offline RL still bears a large burden of specifying/handcrafting extrinsic rewards for each transition in the offline data. As a remedy for this labor-intensive labeling, we propose to endow offline RL tasks with a small amount of expert data and utilize this limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve that, we introduce Calibrated Latent gUidancE (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space such that intrinsic rewards can be directly quantified over the latent space. CLUE's key idea is to align the intrinsic rewards with the expert intention by enforcing the embeddings of expert data onto a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve sparse-reward offline RL performance, outperform state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data.
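    As a loose illustration of an expert-driven intrinsic reward of the kind described above (an assumed functional form and names, not CLUE's exact objective), one can score each transition by its closeness, in the learned latent space, to the calibrated embedding of the expert data:

```python
import torch

def intrinsic_reward(encoder, transition, expert_embedding, scale=1.0):
    """Hypothetical latent-space reward: closeness to the calibrated expert context.

    encoder:          maps a (state, action) transition to a latent vector
    expert_embedding: calibrated mean embedding of the limited expert data
    """
    z = encoder(transition)                         # latent embedding of this transition
    dist = torch.norm(z - expert_embedding, dim=-1)
    return torch.exp(-scale * dist)                 # larger reward near the expert intention
```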

    Processing-structure-protrusion relationship of 3D Cu TSVs: control at the atomic scale

    Get PDF
    A phase-field-crystal model is used to investigate the processing-structure-protrusion relationship of blind Cu through-silicon vias (TSVs) at the atomic scale. A higher temperature results in a larger TSV protrusion. Deformation via dislocation motion dominates at temperatures lower than around 300 °C, while both diffusional and dislocation creep occur at temperatures greater than around 300 °C. TSVs with smaller sidewall roughness Ra and wavelength λa exhibit larger protrusions. Moreover, different protrusion profiles are observed for TSVs with different grain structures. Both protrusions and intrusions are observed when a single grain is placed near the TSV top end, while the top surface protrudes near both edges when it contains more grains. Under symmetric loading, coalescence of the grains occurs near the top end, and a symmetric grain structure can accelerate this process. The strain distributions in TSVs are calculated, and the eigenstrain projection along the vertical direction can be considered an index to predict the TSV protrusion tendency.

    Protrusion of Cu-TSV under different strain states

    Get PDF
    A phase-field-crystal (PFC) model is used to investigate the protrusion of blind TSVs under different strain states. The direction of loading applied to the TSVs has an effect on the protrusion, which is closely related to the copper grains and their orientations at the TSV edges. A nonlinear relation between protrusion and strain rate has been found, which can be explained by different mechanisms of deformation. A higher strain occurring near the top end of the TSVs leads to a larger protrusion of the blind TSVs.

    Beyond Reward: Offline Preference-guided Policy Optimization

    Full text link
    This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or specification of reward functions. Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectively. Since the dynamics and task information are orthogonal, a naive approach would involve using preference-based reward learning followed by an off-the-shelf offline RL algorithm. However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need for separately learning a reward function. OPPO achieves this by introducing an offline hindsight information matching objective for optimizing a contextual policy and a preference modeling objective for finding the optimal context. OPPO further integrates a well-performing decision policy by optimizing the two objectives iteratively. Our empirical results demonstrate that OPPO effectively models offline preferences and outperforms prior competing baselines, including offline RL algorithms performed over either true or pseudo reward function specifications. Our code is available on the project website: https://sites.google.com/view/oppo-icml-2023
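    A schematic sketch of the preference-modeling part of such a one-step approach (assumed loss form and names, not OPPO's actual hindsight information matching or context-search objectives): a score over trajectory (or context) embeddings is trained so that the human-preferred trajectory of each pair scores higher under a Bradley-Terry style objective:

```python
import torch
import torch.nn.functional as F

def preference_loss(score_a, score_b, prefer_a):
    """Bradley-Terry style loss: the preferred trajectory should score higher.

    score_a, score_b: scalar scores of two trajectory embeddings, shape (batch,)
    prefer_a:         1.0 where trajectory A is preferred, else 0.0, shape (batch,)
    Illustrative assumption only; OPPO couples this kind of preference signal with
    a hindsight information matching objective instead of learning a scalar reward.
    """
    logits = score_a - score_b                      # log-odds that A is preferred
    return F.binary_cross_entropy_with_logits(logits, prefer_a)
```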